[Feat] Low overhead #135

Essoz · 2026-01-06T23:06:07Z

PR Description: Low-Overhead Instrumentation Support

Summary

This PR introduces dynamic instrumentation policies (sampling and warm-up) to reduce instrumentation overhead in large-scale training runs. It refactors the instrumentation control logic so that it is injected directly into training and evaluation loops. This keeps instrumentation state consistent and ensures the policies are applied correctly across execution stages.

Key Changes

1. Loop-Based Instrumentation Control

New Module: Introduced traincheck.instrumentor.control with start_step() and start_eval_step() functions.
- start_step(): Increments the global training step counter and applies the configured policy (sampling interval and warm-up) to enable or disable instrumentation for that step.
- start_eval_step(): Maintains a separate eval_step counter for evaluation loops and applies the same global policy.
Decoupling: Moved policy enforcement out of the optimizer.step() wrapper. This prevents the instrumentation state from becoming desynchronized or being incorrectly applied outside of loop contexts.

2. Smart AST Injection (Source Instrumentation)

Enhanced Visitor: Updated InsertTracerVisitor in traincheck/instrumentor/source_file.py to detect and classify loop contexts:
- Training Loops: Identified by calls to optimizer.step() or loss.backward(). The visitor injects start_step().
- Evaluation Loops: Identified by their enclosing context (e.g., functions named test, eval, or valid). The visitor injects start_eval_step().
Automatic Injection: The matching control call is inserted at the start of the loop body.
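A loop classifier of the kind described can be sketched with Python's standard ast module. This is not InsertTracerVisitor itself: LoopControlInjector, the name-prefix rule for evaluation functions, and the choice to instrument only innermost loops (so an enclosing epoch loop does not also advance the step counter) are assumptions made for illustration. Matching any `.step()` or `.backward()` attribute call is deliberately loose and would also match, for example, `scheduler.step()`.

```python
# Hypothetical sketch of loop-context detection with the standard ast module;
# this is not TrainCheck's InsertTracerVisitor.
from __future__ import annotations

import ast

EVAL_PREFIXES = ("test", "eval", "valid")  # assumed naming heuristic
TRAIN_METHODS = {"step", "backward"}       # optimizer.step(), loss.backward()


def _calls_method(node: ast.AST, names: set[str]) -> bool:
    """True if a call such as `obj.step()` appears anywhere under `node`."""
    return any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr in names
        for n in ast.walk(node)
    )


def _has_nested_loop(loop: ast.For | ast.While) -> bool:
    """True if the loop body contains another for/while loop."""
    return any(
        isinstance(n, (ast.For, ast.While))
        for stmt in loop.body
        for n in ast.walk(stmt)
    )


class LoopControlInjector(ast.NodeTransformer):
    def __init__(self) -> None:
        self._functions: list[str] = []  # stack of enclosing function names

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self._functions.append(node.name)
        self.generic_visit(node)
        self._functions.pop()
        return node

    def _visit_loop(self, loop: ast.For | ast.While) -> ast.AST:
        self.generic_visit(loop)  # rewrite inner loops first
        if _has_nested_loop(loop):
            return loop  # only innermost loops advance a step counter
        enclosing = self._functions[-1].lower() if self._functions else ""
        if enclosing.startswith(EVAL_PREFIXES):
            call = "start_eval_step"
        elif _calls_method(loop, TRAIN_METHODS):
            call = "start_step"
        else:
            return loop
        # Prepend e.g. `start_step()` as the first statement of the loop body.
        loop.body.insert(0, ast.parse(f"{call}()").body[0])
        return loop

    visit_For = _visit_loop
    visit_While = _visit_loop


SOURCE = """
def train(model, loader, optimizer):
    for epoch in range(2):
        for x, y in loader:
            loss = model(x, y)
            loss.backward()
            optimizer.step()


def test(model, loader):
    for x, y in loader:
        model(x, y)
"""

if __name__ == "__main__":
    tree = LoopControlInjector().visit(ast.parse(SOURCE))
    print(ast.unparse(ast.fix_missing_locations(tree)))
```

Running the example prints the rewritten source: the inner batch loop of train() begins with start_step(), the loop in test() begins with start_eval_step(), and the outer epoch loop is left unchanged.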

3. CLI & Configuration Updates

traincheck-collect Arguments:
- Added --sampling-interval: Controls how frequently steps are instrumented (e.g., every Nth step).
- Added --warm-up-steps: Specifies the number of initial steps to always instrument, regardless of the sampling interval.
Dynamic Policy: Removed static schedule generation; policies are now evaluated at runtime as each step begins, rather than read from a schedule precomputed up front.
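The two new flags could be parsed and validated along the lines of the sketch below. Only the flag names come from this PR; the standalone parser, the defaults (interval 1, meaning every step, and zero warm-up steps), and the range checks are assumptions, and the real traincheck-collect accepts other arguments that are not shown here.

```python
# Hypothetical standalone parser for the two new traincheck-collect flags;
# defaults and validation are assumptions, and all other flags are omitted.
import argparse


def _positive_int(text: str) -> int:
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError(f"expected an integer >= 1, got {value}")
    return value


def _non_negative_int(text: str) -> int:
    value = int(text)
    if value < 0:
        raise argparse.ArgumentTypeError(f"expected an integer >= 0, got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="traincheck-collect")
    parser.add_argument(
        "--sampling-interval",
        type=_positive_int,
        default=1,
        help="instrument every Nth step (1 = every step)",
    )
    parser.add_argument(
        "--warm-up-steps",
        type=_non_negative_int,
        default=0,
        help="number of initial steps that are always instrumented",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--sampling-interval", "10", "--warm-up-steps", "5"]
    )
    print(args.sampling_interval, args.warm_up_steps)
```

Because only the two numbers are stored, the per-step decision can be made from the current step index at runtime, with no schedule built in advance.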

4. Robustness Improvements

Stage Transitions: Updated annotate_stage to reset DISABLE_WRAPPER to False when entering a new stage. This re-enables instrumentation by default when switching contexts (e.g., from training to validation) and prevents a disabled state from leaking from one stage into the next.

Verification

Unit Tests:
- Added tests/test_loop_injection.py to verify that AST transformations correctly identify loop types and inject the appropriate control calls.
- Updated tests/test_dynamic_policy.py to verify the runtime logic of start_step and policy application.
- Verified tests/test_policy_injection.py for CLI argument integration.
End-to-End Testing:
- Verified with the mnist.py example: trace logs show the expected "Interval step" (instrumented) and "Skipping step" (skipped) messages for both the training and testing loops.


Essoz added 16 commits January 4, 2026 00:25

[WIP] refactor: move proxy_wrapper to within the instrumentor

5fb1424

WIP: strengthen instrumentation logic

0c0ebf8

remove unused proxy config; rename ML_DAIKON to TRAINCHECK

18b6a8b

subclass selective instrumentation impl

730463d

fix instrumentation logic to get the parent class of a method definition

060872d

fix: only use positional arguments for function_wrapper

b14e0d4

fix: respect configured tracker type during selective instrumentation

2034dfd

fix: unify registry implementation for proxy and subclass

ac5202f

fix: step incrementing logic

3383275

add: richer error msg for unchanged var check in contain relation

0313757

fix: subclass registry updating process

d478e27

fix: remove unproxy scanning for subclass to further reduce overhead

6a9f694

add: refined logging for observer and registry

491ddb4

fix: selective dumping for the proxy class

ab95372

add: monkey patch __setattr__ at the module level when using subclass, to ensure submodule assignments are captured

53d90ca

add: support sampling and warmup instrumentation policies

6406fc7

Essoz force-pushed the low-overhead branch from 07787ab to 6406fc7 on February 12, 2026 00:00

Essoz merged commit cdca52e into main Feb 12, 2026
0 of 3 checks passed
